Seedability: optimizing alignment parameters for sensitive sequence comparison

Author: Ayad LAK
Chikhi R
Pissis SP
Publication venue: Oxford University Press (OUP)
Publication date: 12/08/2023

Data availability: The data underlying this article are available either in https://github.com/lorrainea/Seedability or in the ensembl database at https://www.ensembl.org, and can be accessed using the gene names ENSPTRG00000044036 and ENSG00000174236 or in the NCBI database at https://www.ncbi.nlm.nih.gov and can be found using the reference sequence NC_000001.11.

Motivation: Most sequence alignment techniques make use of exact k-mer hits, called seeds, as anchors to optimize alignment speed. A large number of bioinformatics tools employing seed-based alignment techniques, such as Minimap2, use a single value of k per sequencing technology, without a strong guarantee that this is the best possible value. Given the ubiquity of sequence alignment, identifying values of k that lead to more sensitive alignments is thus an important task. To aid this, we present Seedability, a seed-based alignment framework designed for estimating an optimal seed k-mer length (as well as a minimal number of shared seeds) based on a given alignment identity threshold. In particular, we were motivated to make Minimap2 more sensitive in the pairwise alignment of short sequences.

Results: The experimental results herein show improved alignments of short and divergent sequences when using the parameter values determined by Seedability in comparison to the default values of Minimap2. We also show several cases of pairs of real divergent sequences, where the default parameter values of Minimap2 yield no output alignments, but the values output by Seedability produce plausible alignments.

Availability and implementation: https://github.com/lorrainea/Seedability (distributed under GPL v3.0).

R.C. was supported by ANR Full-RNA, SeqDigger, Inception, and PRAIRIE grants (ANR-22-CE45-0007, ANR-19-CE45-0008, PIA/ANR16-CONV-0005, ANR-19-P3IA-0001). This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreements No. 872539 (PANGAIA) and 956229 (ALPACA).
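
To make the notion of exact k-mer seeds in the abstract above concrete, the following is a minimal Python sketch that counts the k-mers shared by two sequences and searches, by brute force, for the largest k that still yields a required number of shared seeds. It is not Seedability's estimation method, which works from an alignment identity threshold. The function names, the `min_shared` and `k_max` parameters, and the example sequences are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_seeds(a, b, k):
    """Return the distinct k-mers that occur exactly in both sequences."""
    return kmers(a, k) & kmers(b, k)


def largest_k_with_seeds(a, b, min_shared=1, k_max=31):
    """Return the largest k <= k_max for which a and b share at least
    min_shared distinct k-mers, or None if no such k exists.

    Brute-force illustration only; min_shared and k_max are assumed knobs.
    """
    for k in range(min(k_max, len(a), len(b)), 0, -1):
        if len(shared_seeds(a, b, k)) >= min_shared:
            return k
    return None


if __name__ == "__main__":
    a = "ACGTACGTGA"  # toy sequences differing at one position
    b = "ACGTTCGTGA"
    print(sorted(shared_seeds(a, b, 4)))
    print(largest_k_with_seeds(a, b, min_shared=1))
    print(largest_k_with_seeds(a, b, min_shared=3))
```

On such toy input, requiring more shared seeds pushes the usable k downward, which is the trade-off between seed length and sensitivity that the abstract describes.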

Brunel University Research Archive

String Sanitization: A Combinatorial Approach

Author: B Cazaux
CC Aggarwal
D Pissinger
J Gallant
M Crochemore
O Abul
R Grossi
SP Pissis
VS Verykios
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/04/2020

Crossref

CWI's Institutional Repository

Fast Indexes for Gapped Pattern Matching

Author: D Knuth
G Navarro
J Bader
K Fredriksson
M Crochemore
M Lewenstein
M Morgante
P Bille
Philip Bille
R Saikkonen
SP Pissis
T Crawford
T Haapasalo
U Manber
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/02/2020

We describe indexes for searching large data sets for variable-length-gapped (VLG) patterns. VLG patterns are composed of two or more subpatterns, between each adjacent pair of which is a gap-constraint specifying upper and lower bounds on the distance allowed between subpatterns. VLG patterns have numerous applications in computational biology (motif search), information retrieval (e.g., for language models, snippet generation, machine translation) and capture a useful subclass of the regular expressions commonly used in practice for searching source code. Our best approach provides search speeds several times faster than prior art across a broad range of patterns and texts.

Comment: This research is supported by Academy of Finland through grant 319454 and has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
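
The definition of a VLG pattern above (subpatterns separated by gaps with lower and upper bounds) can be illustrated with a short Python sketch that compiles such a pattern into a regular expression and scans the text. This is a plain scan with the standard `re` module, not the indexes the paper describes. The function names and the example text are hypothetical.

```python
import re


def vlg_regex(subpatterns, gaps):
    """Compile a VLG pattern into a regex.

    subpatterns: list of literal strings (at least one).
    gaps: list of (lo, hi) bounds, one per adjacent pair of subpatterns,
          giving the allowed number of characters between them.
    """
    if len(gaps) != len(subpatterns) - 1:
        raise ValueError("need exactly one gap per adjacent subpattern pair")
    parts = [re.escape(subpatterns[0])]
    for (lo, hi), sub in zip(gaps, subpatterns[1:]):
        parts.append(".{%d,%d}" % (lo, hi))  # gap of lo..hi arbitrary chars
        parts.append(re.escape(sub))
    # A lookahead lets matches that start at different positions overlap.
    return re.compile("(?=(" + "".join(parts) + "))", re.DOTALL)


def vlg_starts(text, subpatterns, gaps):
    """Return every start position at which the VLG pattern matches."""
    return [m.start() for m in vlg_regex(subpatterns, gaps).finditer(text)]


if __name__ == "__main__":
    text = "abXXcd_abcd_abXXXXXcd"
    # "ab", then 0 to 3 arbitrary characters, then "cd"
    print(vlg_starts(text, ["ab", "cd"], [(0, 3)]))
```

A scan like this reads the whole text for every query. An index, as the abstract proposes, is built once so that repeated queries avoid that cost.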

arXiv.org e-Print Archive

Crossref

Efficient Pattern Matching in Elastic-Degenerate Texts

Author: A Amir
A Dilthey
B Schieber
D Gusfield
DE Knuth
DM Church
E Ukkonen
EM McCreight
HT Harel
L Huang
MS Rahman
S Maciuca
SP Pissis
Y Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/02/2017

Crossref

King's Research Portal

Subframe Temporal Alignment of Non-Stationary Cameras

Author: B Cazaux
CC Aggarwal
D Pissinger
J Gallant
M Crochemore
O Abul
R Grossi
SP Pissis
VS Verykios
Publication date: 01/01/2008

This paper studies the problem of estimating the sub-frame temporal offset between unsynchronized, non-stationary cameras. Based on motion trajectory correspondences, the estimation is done in two steps. First, we propose an algorithm to robustly estimate the frame-accurate offset by analyzing the trajectories and matching their characteristic time patterns. Using this result, we then show how the estimation of the fundamental matrix between two cameras can be reformulated to yield the sub-frame-accurate offset from nine correspondences. We verify the robustness and performance of our approach on synthetic data as well as on real video sequences.

CiteSeerX

Crossref

CWI's Institutional Repository

King's Research Portal

String sanitization: a combinatorial approach

Author: B Cazaux
CC Aggarwal
D Pissinger
J Gallant
M Crochemore
O Abul
R Grossi
SP Pissis
VS Verykios
Publication venue: Springer LNCS
Publication date: 08/06/2019

String data are often disseminated to support applications such as location-based service provision or DNA sequence analysis. This dissemination, however, may expose sensitive patterns that model confidential knowledge (e.g., trips to mental health clinics from a string representing a user’s location history). In this paper, we consider the problem of sanitizing a string by concealing the occurrences of sensitive patterns, while maintaining data utility. First, we propose a time-optimal algorithm, TFS-ALGO, to construct

Crossref

CWI's Institutional Repository

University of Birmingham Research Portal

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

King's Research Portal

Application and Algorithm: Maximal Motif Discovery for Biological Data in a Sliding Window

Author: A-C Leonard
AM Carvalho
CS Iliopoulos
G Pavesi
J van Helden
M Meijer
MS Waterman
N Pisanti
R Grossi
RS Fuller
S Sinha
SP Pissis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Crossref

King's Research Portal

Amborella trichopoda is strongly supported as the single living species of the sister lineage to all other extant flowering plants, providing a unique reference for inferring the genome content and structure of the most recent common ancestor (MRCA) of living angiosperms. Sequencing the Amborella genome, we identified an ancient genome duplication predating angiosperm diversification, without evidence of subsequent, lineage-specific genome duplications. Comparisons between Amborella and other angiosperms facilitated reconstruction of the ancestral angiosperm gene content and gene order in the MRCA of core eudicots. We identify new gene families, gene duplications, and floral protein-protein interactions that first appeared in the ancestral angiosperm. Transposable elements in Amborella are ancient and highly divergent, with no recent transposon radiations. Population genomic analysis across Amborella's native range in New Caledonia reveals a recent genetic bottleneck and geographic structure with conservation implications.

Archivio istituzionale della ricerca - Università degli Studi di Udine